Assignment 1: Stanford Future Bay Initiative

Create bar or line charts showing monthly total kBTUs of residential and commercial electricity and gas consumption for the Bay Area (meaning the sum of all ZIP codes in the 9 Bay Area counties) from 2017 to the latest available month (meaning a 42-column version of the plots from section 1.8). Look online for the correct conversion of kWhs to kBTUs and therms to kBTUs. Make sure that electricity and gas data are distinguishable but plotted in the same chart; feel free to separate your analyses for residential and commercial into two separate charts if you believe it improves legibility, or do one chart with 4 colors in the legend. Comment on any observable changes in energy consumption that may be attributable to the COVID-19 pandemic. Create at least one map that highlights which neighborhoods experienced the greatest change in electricity consumption before and after the pandemic began, and comment on your results. Explain any key assumptions you made in the analysis, or caveats about the data sources that you think the reader should be aware of. Publish all of this work in a web report.

Setting Up

The following libraries and packages are used in this analysis:

library(tidyverse)
library(ggplot2)
library(leaflet)
library(tigris)
library(sf)
library(leaflet)
library(plotly)
library(zoo)
library(knitr)

Importing Datasets

The following code imports electricity usage datasets from Pacific Gas & Electric (PG&E) between the years 2017 and 2020, located at: https://pge-energydatarequest.com/public_datasets

years <- 2017:2020
quarters <- 1:4
types <- "Electric"

pge_elec <- NULL

for(quarter in quarters){
  for(year in years){
    for(type in types){
      filename <- paste0("PGE_", year, "_Q", quarter, "_", type, "UsageByZip.csv")
      if(year == 2020 && quarter %in% 3:4){
        next
      }
      print(filename)
      temp <- read_csv(filename)
    }
    pge_elec <- rbind(pge_elec, temp)
    saveRDS(pge_elec, "pge_elec.rds")
  }
}
## [1] "PGE_2017_Q1_ElectricUsageByZip.csv"
## [1] "PGE_2018_Q1_ElectricUsageByZip.csv"
## [1] "PGE_2019_Q1_ElectricUsageByZip.csv"
## [1] "PGE_2020_Q1_ElectricUsageByZip.csv"
## [1] "PGE_2017_Q2_ElectricUsageByZip.csv"
## [1] "PGE_2018_Q2_ElectricUsageByZip.csv"
## [1] "PGE_2019_Q2_ElectricUsageByZip.csv"
## [1] "PGE_2020_Q2_ElectricUsageByZip.csv"
## [1] "PGE_2017_Q3_ElectricUsageByZip.csv"
## [1] "PGE_2018_Q3_ElectricUsageByZip.csv"
## [1] "PGE_2019_Q3_ElectricUsageByZip.csv"
## [1] "PGE_2017_Q4_ElectricUsageByZip.csv"
## [1] "PGE_2018_Q4_ElectricUsageByZip.csv"
## [1] "PGE_2019_Q4_ElectricUsageByZip.csv"
#Conversion factors to convert KWH to kBTU
pge_elec$TOTALKBTU <- pge_elec$TOTALKWH*3.412
pge_elec$AVERAGEKBTU <- pge_elec$AVERAGEKWH*3.412
pge_elec$TOTALKWH <- NULL
pge_elec$AVERAGEKWH <- NULL

pge_elec
## # A tibble: 128,319 x 8
##    ZIPCODE MONTH  YEAR CUSTOMERCLASS COMBINED TOTALCUSTOMERS TOTALKBTU
##      <dbl> <dbl> <dbl> <chr>         <chr>             <dbl>     <dbl>
##  1   93117     1  2017 Elec- Agricu~ Y                     0        0 
##  2   93117     2  2017 Elec- Agricu~ Y                     0        0 
##  3   93117     3  2017 Elec- Agricu~ Y                     0        0 
##  4   93203     1  2017 Elec- Agricu~ Y                     0        0 
##  5   93203     2  2017 Elec- Agricu~ Y                     0        0 
##  6   93203     3  2017 Elec- Agricu~ Y                   158 17200612.
##  7   93204     1  2017 Elec- Agricu~ Y                     0        0 
##  8   93204     2  2017 Elec- Agricu~ Y                     0        0 
##  9   93204     3  2017 Elec- Agricu~ Y                     0        0 
## 10   93206     1  2017 Elec- Agricu~ Y                   302 30058041.
## # ... with 128,309 more rows, and 1 more variable: AVERAGEKBTU <dbl>

Now, I import gas usage data from Pacific Gas & Electric (PG&E) between the years 2017 and 2020, located at: https://pge-energydatarequest.com/public_datasets

years <- 2017:2020
quarters <- 1:4
types <- "Gas"

pge_gas <- NULL

for(quarter in quarters){
  for(year in years){
    for(type in types){
      filename <- paste0("PGE_", year, "_Q", quarter, "_", type, "UsageByZip.csv")
      if(year == 2020 && quarter %in% 3:4){
        next
      }
      print(filename)
      temp <- read_csv(filename)
    }
    pge_gas <- rbind(pge_gas, temp)
    saveRDS(pge_gas, "pge_gas.rds")
  }
}
## [1] "PGE_2017_Q1_GasUsageByZip.csv"
## [1] "PGE_2018_Q1_GasUsageByZip.csv"
## [1] "PGE_2019_Q1_GasUsageByZip.csv"
## [1] "PGE_2020_Q1_GasUsageByZip.csv"
## [1] "PGE_2017_Q2_GasUsageByZip.csv"
## [1] "PGE_2018_Q2_GasUsageByZip.csv"
## [1] "PGE_2019_Q2_GasUsageByZip.csv"
## [1] "PGE_2020_Q2_GasUsageByZip.csv"
## [1] "PGE_2017_Q3_GasUsageByZip.csv"
## [1] "PGE_2018_Q3_GasUsageByZip.csv"
## [1] "PGE_2019_Q3_GasUsageByZip.csv"
## [1] "PGE_2017_Q4_GasUsageByZip.csv"
## [1] "PGE_2018_Q4_GasUsageByZip.csv"
## [1] "PGE_2019_Q4_GasUsageByZip.csv"
pge_gas$TOTALKBTU <- pge_gas$TOTALTHM*100
pge_gas$AVERAGEKBTU <- pge_gas$AVERAGETHM*100
pge_gas$TOTALTHM <- NULL
pge_gas$AVERAGETHM <- NULL

pge_gas
## # A tibble: 57,038 x 8
##    ZIPCODE MONTH  YEAR CUSTOMERCLASS COMBINED TOTALCUSTOMERS TOTALKBTU
##      <dbl> <dbl> <dbl> <chr>         <chr>             <dbl>     <dbl>
##  1   92304     1  2017 Gas- Commerc~ Y                     0         0
##  2   92304     2  2017 Gas- Commerc~ Y                     0         0
##  3   92304     3  2017 Gas- Commerc~ Y                     0         0
##  4   92365     1  2017 Gas- Commerc~ Y                     0         0
##  5   92365     2  2017 Gas- Commerc~ Y                     0         0
##  6   92365     3  2017 Gas- Commerc~ Y                     0         0
##  7   93203     1  2017 Gas- Commerc~ Y                     0         0
##  8   93203     2  2017 Gas- Commerc~ Y                     0         0
##  9   93203     3  2017 Gas- Commerc~ Y                     0         0
## 10   93204     1  2017 Gas- Commerc~ Y                     0         0
## # ... with 57,028 more rows, and 1 more variable: AVERAGEKBTU <dbl>

Binding the two datasets together:

pge_elec_gas <- rbind(pge_gas, pge_elec)
pge_elec_gas
## # A tibble: 185,357 x 8
##    ZIPCODE MONTH  YEAR CUSTOMERCLASS COMBINED TOTALCUSTOMERS TOTALKBTU
##      <dbl> <dbl> <dbl> <chr>         <chr>             <dbl>     <dbl>
##  1   92304     1  2017 Gas- Commerc~ Y                     0         0
##  2   92304     2  2017 Gas- Commerc~ Y                     0         0
##  3   92304     3  2017 Gas- Commerc~ Y                     0         0
##  4   92365     1  2017 Gas- Commerc~ Y                     0         0
##  5   92365     2  2017 Gas- Commerc~ Y                     0         0
##  6   92365     3  2017 Gas- Commerc~ Y                     0         0
##  7   93203     1  2017 Gas- Commerc~ Y                     0         0
##  8   93203     2  2017 Gas- Commerc~ Y                     0         0
##  9   93203     3  2017 Gas- Commerc~ Y                     0         0
## 10   93204     1  2017 Gas- Commerc~ Y                     0         0
## # ... with 185,347 more rows, and 1 more variable: AVERAGEKBTU <dbl>

Geography Setup

The following code sets up the Bay Area counties, cities, zip codes, and associated geometry:

bay_county_names <-
  c(
    "Alameda",
    "Contra Costa",
    "Marin",
    "Napa",
    "San Francisco",
    "San Mateo",
    "Santa Clara",
    "Solano",
    "Sonoma"
  )

bay_counties <-
  counties("CA", cb = T, progress_bar = F) %>%
  filter(NAME %in% bay_county_names)

usa_zips <- 
  zctas(cb = T, progress_bar = F)

bay_zips <-
  usa_zips %>% 
  st_centroid() %>% 
  .[bay_counties, ] %>% 
  st_set_geometry(NULL) %>% 
  left_join(usa_zips %>% select(GEOID10)) %>% 
  st_as_sf()

ca_cities <- places("CA", cb = T, progress_bar = FALSE)

bay_cities <- ca_cities[bay_counties, ]

bay_cities_within <-
  ca_cities %>%
  st_centroid() %>%
  .[bay_counties, ] %>%
  st_set_geometry(NULL) %>%
  left_join(ca_cities %>% select(GEOID)) %>%
  st_as_sf()

Manipulating Data

I now filter the PGE_Elec_Gas combined dataset only to Residential and Commercial gas and electricity usage corresponding to each zip code, month and year.

pge_final <-
  pge_elec_gas %>% 
  mutate(
    ZIPCODE = ZIPCODE %>% as.character()
  )%>%
  filter(
    CUSTOMERCLASS %in% 
      c(
        "Elec- Residential",
        "Elec- Commercial",
        "Gas- Residential",
        "Gas- Commercial"
      )
  ) %>% 
  select(
    !c(COMBINED, AVERAGEKBTU)
  ) %>% 
  group_by(ZIPCODE, MONTH, YEAR, CUSTOMERCLASS) %>% 
  summarize(
    TOTALKBTU = 
      sum(
        TOTALKBTU, 
        na.rm = T
      ),
    TOTALCUSTOMERS =
      sum(
        TOTALCUSTOMERS,
        na.rm = T
      )
  ) %>% 
  mutate(
    AVERAGEKBTU =
      TOTALKBTU/TOTALCUSTOMERS
  ) %>%
  right_join(
    bay_zips %>% select(GEOID10),
    by = c("ZIPCODE" = "GEOID10")
  ) %>% 
  st_as_sf() %>% 
  st_transform(4326)

pge_final$DATE <- as.yearmon(paste(pge_final$YEAR, pge_final$MONTH), "%Y %m")
pge_final
## Simple feature collection with 44668 features and 8 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -123.5335 ymin: 36.89303 xmax: -121.3083 ymax: 38.93713
## geographic CRS: WGS 84
## # A tibble: 44,668 x 9
## # Groups:   ZIPCODE, MONTH, YEAR [12,189]
##    ZIPCODE MONTH  YEAR CUSTOMERCLASS TOTALKBTU TOTALCUSTOMERS AVERAGEKBTU
##  * <chr>   <dbl> <dbl> <chr>             <dbl>          <dbl>       <dbl>
##  1 94002       1  2017 Elec- Commer~ 14147749.            701      20182.
##  2 94002       1  2017 Elec- Reside~ 18706205.          10553       1773.
##  3 94002       1  2017 Gas- Commerc~        0               0        NaN 
##  4 94002       1  2017 Gas- Residen~ 83483700            9373       8907.
##  5 94002       1  2018 Elec- Commer~ 13867518.            706      19642.
##  6 94002       1  2018 Elec- Reside~ 17254730.          10599       1628.
##  7 94002       1  2018 Gas- Commerc~ 12584900             446      28217.
##  8 94002       1  2018 Gas- Residen~ 64025800            9406       6807.
##  9 94002       1  2019 Elec- Commer~ 14407535.            717      20094.
## 10 94002       1  2019 Elec- Reside~ 17518716.          10628       1648.
## # ... with 44,658 more rows, and 2 more variables: geometry <MULTIPOLYGON [°]>,
## #   DATE <yearmon>

The total energy consumption in kBTU is ordered by month and categorized by energy type and location in the plot below:

pge_chart <-
  pge_final %>%
  ggplot() + 
  geom_bar(
    aes(
      x = DATE %>% factor(),
      y = TOTALKBTU,
      fill = CUSTOMERCLASS
    ),
    stat = "identity",
    position = "stack"
  ) + 
  labs(
    x = "Month",
    y = "kBTU",
    title = "Bay Area Monthly Energy Usage, PG&E 2017-2020",
    fill = "Energy Type"
  )

pge_final_chart <- pge_chart %>%
  ggplotly() %>%
  layout(
    xaxis = list(fixedrange = T, tickangle = -90),
    yaxis = list(fixedrange = T)
  ) %>% 
  config(displayModeBar = F)

pge_final_chart

Next, I take the PGE_Final dataset from above. To determine the effect of COVID-19 on residential energy consumption, I assume energy consumption from March to June 2019 as the baseline. The following process displays the baseline (2019) consumption and the adjusted (2020) consumption and computes the difference and percent change.

pge_bay_change <-
  pge_final %>% 
  filter(YEAR %in% 2019:2020, MONTH %in% (3:6), CUSTOMERCLASS == "Elec- Residential") %>% 
  st_set_geometry(NULL) %>%
  select(!c(MONTH, TOTALCUSTOMERS, DATE, AVERAGEKBTU)) %>%
  pivot_wider(
    names_from = YEAR,
    values_from = TOTALKBTU
  ) %>%
  rename(
    kBTU_2019 = "2019",
    kBTU_2020 = "2020"
  ) %>%
  right_join(
    bay_zips %>% select(GEOID10),
    by = c("ZIPCODE" = "GEOID10")
  ) %>% 
  st_as_sf() %>% 
  st_transform(4326)
  

pge_bay_change$kBTU_Change <- pge_bay_change$kBTU_2020 - pge_bay_change$kBTU_2019
pge_bay_change$Percent_Change <- pge_bay_change$kBTU_Change/pge_bay_change$kBTU_2019 * 100


pge_bay_change
## Simple feature collection with 1147 features and 7 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -123.5335 ymin: 36.89303 xmax: -121.3083 ymax: 38.93713
## geographic CRS: WGS 84
## # A tibble: 1,147 x 8
## # Groups:   ZIPCODE, MONTH [1,147]
##    MONTH ZIPCODE CUSTOMERCLASS kBTU_2019 kBTU_2020                  geometry
##  * <dbl> <chr>   <chr>             <dbl>     <dbl>        <MULTIPOLYGON [°]>
##  1     3 94002   Elec- Reside~ 15670057. 16308012. (((-122.3359 37.50805, -~
##  2     4 94002   Elec- Reside~ 13092936. 14838170. (((-122.3359 37.50805, -~
##  3     5 94002   Elec- Reside~ 13256889. 14308799. (((-122.3359 37.50805, -~
##  4     6 94002   Elec- Reside~ 12921855. 13501779. (((-122.3359 37.50805, -~
##  5     3 94005   Elec- Reside~  2669122.  2777757. (((-122.442 37.69435, -1~
##  6     4 94005   Elec- Reside~  2195329.  2558167. (((-122.442 37.69435, -1~
##  7     5 94005   Elec- Reside~  2207762.  2394262. (((-122.442 37.69435, -1~
##  8     6 94005   Elec- Reside~  2089263.  2255366. (((-122.442 37.69435, -1~
##  9     3 94010   Elec- Reside~ 34121552. 35412868. (((-122.4126 37.58916, -~
## 10     4 94010   Elec- Reside~ 28494011. 31997446. (((-122.4126 37.58916, -~
## # ... with 1,137 more rows, and 2 more variables: kBTU_Change <dbl>,
## #   Percent_Change <dbl>
summary(pge_bay_change)
##      MONTH       ZIPCODE          CUSTOMERCLASS        kBTU_2019       
##  Min.   :3.0   Length:1147        Length:1147        Min.   :       0  
##  1st Qu.:4.0   Class :character   Class :character   1st Qu.: 3767479  
##  Median :4.5   Mode  :character   Mode  :character   Median :12223084  
##  Mean   :4.5                                         Mean   :12716828  
##  3rd Qu.:5.0                                         3rd Qu.:19206266  
##  Max.   :6.0                                         Max.   :61917779  
##  NA's   :13                                          NA's   :20        
##    kBTU_2020                 geometry     kBTU_Change        Percent_Change   
##  Min.   :       0   MULTIPOLYGON :1147   Min.   :-46740647   Min.   :-83.957  
##  1st Qu.: 3862411   epsg:4326    :   0   1st Qu.:   125118   1st Qu.:  3.796  
##  Median :13508784   +proj=long...:   0   Median :   756224   Median :  7.503  
##  Mean   :13784902                        Mean   :  1119981   Mean   :  8.783  
##  3rd Qu.:20678221                        3rd Qu.:  1684463   3rd Qu.: 12.753  
##  Max.   :68625948                        Max.   : 16595473   Max.   : 67.176  
##  NA's   :16                              NA's   :23          NA's   :79

Mapping (Mapbox and Leaflet)

Finally, the percent differences are visually represented through the following mapping analysis:

res_pal <- colorNumeric(
  palette = "plasma",
  domain = 
    pge_bay_change$Percent_Change
)

leaflet() %>% 
  addTiles() %>% 
  addPolygons(
    data = pge_bay_change,
    fillColor = ~res_pal(Percent_Change),
    color = "black",
    opacity = 0.5,
    fillOpacity = 0.5,
    weight = 1,
    label = ~paste0(
      round(Percent_Change), 
      "% Electricity Usage Change in ",
      ZIPCODE
    ),
    highlightOptions = highlightOptions(
      weight = 2,
      opacity = 1
    )
  ) %>% 
  addLegend(
    data = pge_bay_change,
    pal = res_pal,
    values = ~Percent_Change,
    title = "Percent Change in Electricity Usage, 2019 vs. 2020"
  )

Takeaways

The first part of this assignment determines the annual cycle of Pacific Gas and Electric (PG&E) gas and electricity usage between January 2017 and June 2020 for the 9-county San Francisco Bay Area. A visual representation of the data can be found in the resulting plot above. In summary, annual energy usage peaks in the winter due to the need to heat homes and power holiday decorations. Energy usage dips in the summer despite the warmth of the season, likely because of the relatively cooler summers experienced in the Bay Area. By observation, it appears that the COVID-19 pandemic resulted in greater energy consumption than usual.

This point is numericalized in the second part of the assignment, where I compare electricity usage in the Bay Area between 2019 and 2020, specifically in the months of March and June of both years. The goal is to determine the magnitude of the increase in electricity use as a result of the shelter-in-place order issued during the COVID-19 pandemic. Here, I assumed the four months of March 2019-June 2019 as the baseline, or the energy usage in a “normal” year. I also assumed that the number of PG&E customers remained constant (or negligibly increased/decreased) during the between 2019 and 2020. I contrast these amounts with those from March 2020-June 2020.

The analysis reveals a median increase in residential electricity use of 7.5% between 2019 and 2020 in the Bay Area. One point of interest is that two of the largest increases are located in Downtown Oakland (+32%) and Downtown San Jose (+20%). This could be explained by the proportionally higher amount of high-density/transit-oriented developments in these downtown areas. With more transit-dependent residents (seniors, for example) required to stay home during the pandemic, this likely caused the larger percentage increase in electricity consumption.